Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design
Authors
Abstract
The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks (CNNs) on embedded platforms. Because CNNs are strongly tolerant of computation errors, employing block floating point (BFP) arithmetic in CNN accelerators can efficiently reduce hardware cost and data traffic while maintaining classification accuracy. In this paper, we examine how the word-width choices in the BFP representation affect CNN performance without retraining. Several typical CNN models, including VGG16, ResNet-18, ResNet-50, and GoogLeNet, were tested. Experiments revealed that an 8-bit mantissa, including the sign bit, in the BFP representation induced less than 0.3% accuracy loss. In addition, we analyze the computational errors in theory and derive a noise-to-signal ratio (NSR) upper bound, which provides practical guidance for the design of BFP-based CNN engines.
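As an illustrative sketch of the representation discussed above (a generic Python/NumPy example, not the authors' implementation; the function name bfp_quantize and its parameters are assumptions): in BFP, a block of values shares a single exponent, and each value is stored as a signed mantissa, here 8 bits including the sign bit. The quantization noise-to-signal ratio can then be measured directly against the original values.

    import numpy as np

    def bfp_quantize(block, mantissa_bits=8):
        # One shared exponent per block; each value becomes a signed
        # mantissa of `mantissa_bits` bits (sign bit included).
        max_abs = np.max(np.abs(block))
        if max_abs == 0.0:
            return np.zeros_like(block)
        # Shared exponent chosen so the largest magnitude fits the mantissa range.
        shared_exp = int(np.floor(np.log2(max_abs))) + 1
        scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
        mantissa = np.clip(np.round(block / scale),
                           -(2 ** (mantissa_bits - 1)),
                           2 ** (mantissa_bits - 1) - 1)
        return mantissa * scale

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)           # one block of activations
    xq = bfp_quantize(x, mantissa_bits=8)
    nsr = np.sum((x - xq) ** 2) / np.sum(x ** 2)
    print(f"measured NSR = {nsr:.2e}")    # quantization noise-to-signal ratio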
Similar resources
Improving accuracy of Winograd convolution for DNNs
Modern deep neural networks (DNNs) spend a large amount of their execution time computing convolutions. Winograd's minimal algorithm for small convolutions can greatly reduce the number of arithmetic operations. However, a large reduction in floating point (FP) operations in these algorithms can result in significantly reduced FP accuracy of the result. In this paper we propose several methods ...
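For readers unfamiliar with the algorithm referenced above, here is a minimal sketch of the standard one-dimensional Winograd F(2,3) transform (a textbook construction, not code from the cited paper): it produces two outputs of a 3-tap filter using 4 elementwise multiplications instead of 6, which is the source of both the speedup and the altered floating-point error behavior.

    import numpy as np

    # Standard F(2,3) Winograd transform matrices.
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        # Two outputs of correlating a 4-tap input d with a 3-tap filter g,
        # using 4 elementwise multiplications instead of 6.
        return A_T @ ((G @ g) * (B_T @ d))

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, 1.0, -1.0])
    print(winograd_f23(d, g))                       # Winograd result
    print(np.convolve(d, g[::-1], mode='valid'))    # direct correlation check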
Hardware-oriented Approximation of Convolutional Neural Networks
High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator development is hardware-oriented model approximation. In this paper we present Ristretto, a model app...
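As a generic illustration of the kind of hardware-oriented approximation such tools perform (a sketch only; to_fixed_point is a hypothetical helper, not Ristretto's actual API): floating-point weights are condensed to a signed fixed-point format with fixed integer and fractional bit counts, trading dynamic range for cheap integer arithmetic.

    import numpy as np

    def to_fixed_point(x, int_bits=2, frac_bits=6):
        # Round x to signed fixed point with `int_bits` integer bits
        # (sign included) and `frac_bits` fractional bits, i.e. Q2.6 here.
        scale = 2.0 ** frac_bits
        total = int_bits + frac_bits
        q = np.clip(np.round(x * scale),
                    -(2 ** (total - 1)),
                    2 ** (total - 1) - 1)
        return q / scale

    w = np.array([0.731, -0.052, 1.406, -1.993])
    print(to_fixed_point(w))   # nearest values representable in Q2.6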
Block Floating Point Interval ALU for Digital Signal Processing
Numerical analysis on real numbers is performed using either pointwise arithmetic or interval arithmetic. Today, interval analysis is a mature discipline and finds use not only in error analysis but also control applications such as robotics. The interval arithmetic hardware architecture proposed in the work [W. Edmonson, R. Gupte, J. Gianchandani, S. Ocloo, W. Alexander, Interval arithmetic lo...
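A minimal sketch of the pointwise-versus-interval distinction drawn above (generic Python, not the cited hardware architecture): interval arithmetic carries [lo, hi] bounds through every operation so that the exact real result is always enclosed. A rigorous implementation would also apply directed rounding at each step, which is omitted here for brevity.

    from dataclasses import dataclass

    @dataclass
    class Interval:
        lo: float
        hi: float

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            # The product interval spans the extremes of all endpoint products.
            products = (self.lo * other.lo, self.lo * other.hi,
                        self.hi * other.lo, self.hi * other.hi)
            return Interval(min(products), max(products))

    # A value known only to within measurement error stays bounded
    # through arithmetic: the exact result lies inside the output interval.
    x = Interval(1.9, 2.1)
    y = Interval(-0.1, 0.1)
    print(x * y + x)   # approximately Interval(lo=1.69, hi=2.31)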
An FPGA-Based Face Detector Using Neural Network and a Scalable Floating Point Unit
This study implemented an FPGA-based face detector using neural networks and a scalable floating-point arithmetic unit (FPU). The FPU provides dynamic range and requires fewer bits in the arithmetic unit than a fixed-point approach does. These features reduce memory requirements, making the design efficient for neural network systems with wide data words. The arithmetic unit occupies 39~45% of ...
Flexible On-chip Memory Architecture for DCNN Accelerators
Recent studies show that as the depth of convolutional neural networks (CNNs) increases for higher performance in different machine learning tasks, a major bottleneck in deep CNN (DCNN) processing is the traffic between the accelerator and off-chip memory. However, current state-of-the-art accelerators cannot effectively reduce off-chip feature map traffic due to the limited ...
Journal: CoRR
Volume: abs/1709.07776
Pages: -
Publication date: 2017